feat(aggregation): Add GradVac aggregator #638
rkhosrowshahi wants to merge 11 commits into SimplexLab:main
Conversation
Implement Gradient Vaccine (ICLR 2021) as a stateful Jacobian aggregator. Support group_type 0 (whole model), 1 (all_layer via encoder), and 2 (all_matrix via shared_params), with DEFAULT_GRADVAC_EPS and configurable eps. Add Sphinx page and unit tests. Autogram is not supported; use torch.manual_seed for reproducible task shuffle order. Made-with: Cursor
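For reference, the pairwise update that Gradient Vaccine performs (per Wang et al., ICLR 2021, and mirrored by LibMTL's implementation) can be sketched in plain Python. This is an illustrative sketch, not TorchJD's internals: the function name, the list-based vectors, and the scalar `rho_bar` stand in for the actual blockwise/stateful machinery.

```python
import math

def gradvac_update(g_i, g_j, rho_bar, beta, eps=1e-8):
    """One pairwise Gradient Vaccine step (sketch, not TorchJD's code).

    If cos(g_i, g_j) falls below the EMA target rho_bar, g_i is nudged
    toward g_j with the closed-form coefficient from the paper so that
    the pair reaches the target similarity; the EMA target is then
    updated with rate beta. Returns the new g_i and the new rho_bar.
    """
    dot = sum(a * b for a, b in zip(g_i, g_j))
    n_i = math.sqrt(sum(a * a for a in g_i))
    n_j = math.sqrt(sum(b * b for b in g_j))
    rho = dot / (n_i * n_j + eps)
    if rho < rho_bar:
        # Closed-form "vaccine" coefficient from the GradVac paper.
        w = (n_i * (rho_bar * math.sqrt(1 - rho**2)
                    - rho * math.sqrt(1 - rho_bar**2))
             / (n_j * math.sqrt(1 - rho_bar**2) + eps))
        g_i = [a + w * b for a, b in zip(g_i, g_j)]
        # LibMTL updates the EMA target only when the update fires.
        rho_bar = (1 - beta) * rho_bar + beta * rho
    return g_i, rho_bar
```

After the update, cos(g_i, g_j) equals the target rho_bar by construction, which is the defining property of the vaccine step.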
Hey @ValerianRey:
I think the link-check action failed because of bad luck; re-running the job fixed it.
Thanks a lot for the PR, I'll look at it tonight or tomorrow!
/opencode:Plan can you please check that the actual implementation (from line 219 to line 251 of _gradvac.py) is equivalent to the implementation in https://github.com/median-research-group/LibMTL/blob/main/LibMTL/weighting/GradVac.py? Please report anything that differs.
Let me analyze the differences between the two implementations.

Comparison Summary

After comparing lines 219-251 of _gradvac.py with LibMTL's GradVac.py:

✅ Equivalent Behaviors

🔍 Differences Found

📝 Notes
Conclusion: The implementations are algorithmically equivalent. TorchJD includes minor improvements (a configurable eps and a numerical clamp) and follows the codebase's design patterns.
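As an illustration of the kind of numerical clamp referred to above (a sketch under assumed conventions, not TorchJD's actual code): the cosine similarity is kept strictly inside (-1, 1) so that the sqrt(1 - rho**2) terms in the vaccine coefficient stay real and nonzero.

```python
def safe_cosine(dot, norm_i, norm_j, eps=1e-8):
    """Cosine similarity guarded for use in GradVac-style formulas (sketch).

    eps in the denominator avoids division by zero; the clamp keeps |rho|
    strictly below 1 so that sqrt(1 - rho**2) never vanishes.
    """
    rho = dot / (norm_i * norm_j + eps)
    return max(-1.0 + eps, min(1.0 - eps, rho))
```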
/opencode:Plan Please review this PR in detail.
Really good work, tyvm! This seems almost ready to merge IMO. Just a bunch of nitpicks that should be really easy to fix, and 1 somewhat important implementation question (iterating over groups in outer loop vs inner loop).
I would also argue that this could be implemented as a Weighting internally (because we actually act on norms and cosine similarities between gradients, which is what the gramian contains). Also, it's possible to keep track of norms and cosine similarities between projected gradients even if we don't have those gradients, just by making some operations on the gramian. This is what we did to implement PCGrad as a Weighting.
For example, imagine g1 and g2 are two gradients. From the gramian, you know ||g1|| and ||g2|| (the square roots of the diagonal elements) and g1 . g2 (an off-diagonal element), so you can deduce cos(g1, g2) from that.
If you compute g1' = g1 + w * g2, you can also directly deduce the norm of g1':
||g1'||² = ||g1||² + w² ||g2||² + 2w g1 . g2 (all terms on the right-hand side are known).
Similarly, you can compute g1' . g2 = (g1 + w * g2) . g2 = g1 . g2 + w ||g2||².
So even after projection, you still know the dot products between all of your gradients, meaning that you still know the "new" gramian.
I didn't think through it entirely but at a first glance it seems possible to adapt this as a weighting, because of that. The implementation may even be faster actually (because we have fewer norms to recompute). But it may be hard to implement, so IMO we should merge this without even trying to implement it as a Weighting, and we can always improve later. @PierreQuinton what do you think about that?
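The gramian bookkeeping described above can be sketched concretely. This toy helper (illustrative name, plain nested lists for the gramian, not TorchJD's Weighting API) updates G after the replacement g_i <- g_i + w * g_j using only entries of G, with no access to the gradients themselves:

```python
def project_update_gramian(G, i, j, w):
    """Update gramian G in place after g_i <- g_i + w * g_j (sketch).

    G[a][b] stores g_a . g_b. Only row i and column i change:
      - new G[i][k] for k != i: (g_i + w g_j) . g_k = G[i][k] + w G[j][k]
      - new G[i][i]: ||g_i||² + w² ||g_j||² + 2w (g_i . g_j)
    """
    n = len(G)
    new_row = [G[i][k] + w * G[j][k] for k in range(n)]
    new_row[i] = G[i][i] + w * w * G[j][j] + 2 * w * G[i][j]
    for k in range(n):
        G[i][k] = new_row[k]
        G[k][i] = new_row[k]
    return G
```

With g1 = (1, 0) and g2 = (1, 1), G = [[1, 1], [1, 2]]; applying w = 0.5 to row 0 gives the gramian of g1' = (1.5, 0.5) and g2, without ever materializing those vectors.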
- Use group_type "whole_model" | "all_layer" | "all_matrix" instead of 0/1/2
- Remove DEFAULT_GRADVAC_EPS from the public API; keep default 1e-8; allow eps=0
- Validate beta via setter; tighten GradVac repr/str expectations
- Fix all_layer leaf sizing via children() and parameters() instead of private fields
- Trim redundant GradVac.rst prose; align docs with the new API
- Tests: GradVac cases, value regression with torch.manual_seed for GradVac
- Plotter: factory dict + fresh aggregator instances per update; legend from selected keys; MathJax labels and live angle/length readouts in the sidebar

This commit includes the GradVac implementation with an Aggregator class.
…hting GradVac only needs gradient norms and dot products, which are fully determined by the Gramian. This makes GradVac compatible with the autogram path.

- Remove grouping parameters (group_type, encoder, shared_params) from GradVac
- Export GradVacWeighting publicly
Force-pushed 66c99db to 6fbc7b8
Seed is already set to 0 because of the autoused fix_randomness fixture declared in conftest.py
I think this is ready to merge, except for some plotting things. Can we remove the changes to the plotter and make plotter improvements in a different PR (except adding GradVac to the list of aggregators in the plotter)? I see a few issues in the plotter changes, and I'd rather merge this PR now and make the rest of the changes in a different PR. @rkhosrowshahi BTW the link-check action will fail because the links I added in the readme point to documentation that will only be built after we merge this.


Summary
Adds Gradient Vaccine (GradVac) from ICLR 2021 as a stateful Aggregator on the full task Jacobian.

Behavior

- Maintains an EMA target cosine similarity \bar{\rho}, with the closed-form vaccine update when \rho < \bar{\rho}.
- group_type: 0 whole model (single block); 1 all_layer via encoder (leaf modules with parameters); 2 all_matrix via shared_params (one block per tensor, iteration order = Jacobian column order).
- DEFAULT_GRADVAC_EPS and configurable eps (constructor + mutable attribute).
- Task order is shuffled with torch.randperm; use torch.manual_seed for reproducibility.

Files

- src/torchjd/aggregation/_gradvac.py, export in __init__.py
- docs/source/docs/aggregation/gradvac.rst + index toctree
- tests/unit/aggregation/test_gradvac.py

Verification

- ruff format / ruff check on touched paths
- ty check on _gradvac.py
- pytest tests/unit/aggregation/test_gradvac.py tests/unit/aggregation/test_values.py -W error